\section{Introduction}
\label{section:introduction}
%The well functioning of computer networks depends on the efficient forwarding of packets as well as the effective management of networks. 
NetFlow\cite{claise_cisco_2004} is a widely used tool in network measurement and analysis.
It records traffic statistics in the form of flow records, where each record contains important 
information about a flow, for example, its source and destination IP addresses, 
start and end timestamps,  type of services, application ports, input and output ports, 
as well as the volume of packets or bytes, etc. 

A challenge in implementing NetFlow like tools is to keep up with the ultra high speed 
of network traffic, especially on high-bandwidth backbone links. For example, assuming an 
average packet size of 700 bytes, and a 40 Gbps link, the time budget for processing 
one packet is only around 50 nano-seconds\cite{zhang_more_2017}\cite{zhang_more_2015}\cite{wang_efficient_2019}.
 In an extreme case where packets of 40 bytes 
arrive at a speed of 100 Gbps, the time budget will be only a few nano-seconds. 
NetFlow also faces the high diversity of the traffic, where hundreds of thousands, 
even millions of concurrent flows appear in a measurement epoch. This pose stringent 
pressure on the scarce high-speed memories, such as on-chip SRAM with 1 $\sim$ 10 nano-seconds 
access delay\cite{li_flowradar:_2016}\cite{noauthor_access_nodate}.

One straightforward solution is to use sampling \cite{noauthor_sampled_nodate}, 
where out of several packets, only one of them gets processed and used to update the flow records.
However, sampling reduces processing overhead at the cost of fewer packets or flows being recorded, 
thus less accurate statistics that can be estimated. To remedy this, very enhanced sampling algorithms
\cite{hohn_inverting_2003}\cite{duffield_estimating_2005}\cite{tune_towards_2008}
have been proposed and tailored for specific measurement requirement, 
and their impact analyzed\cite{duffield2004}\cite{SamplingImpact}.
Another direction of solution is to use sketch (also referred as data streaming algorithms)
\cite{DataStreams2005}\cite{huang_sketchvisor:_2017}\cite{chen_counter_2017},
where a succinct data structure is designed and can be updated very efficiently. 
However, these sophisticated data structures and algorithms generally can only be used in limited scenarios, 
but not for the wide range of applications that the original NetFlow can support.

Towards accelerating flow record maintenance and achieving better statistics estimation, 
recently a few algorithms that make enhancement to a naive hash table and integrate sketches 
have been proposed, including OpenSketch\cite{yu_software_2013}, 
UnivMon\cite{liu_one_2016}, FlowRadar\cite{li_flowradar:_2016}, HashPipe\cite{sivaraman_heavy-hitter_2017},  
and ElasticSketch\cite{yang_elastic_2018}, etc. Both constant bound of the worst case delay and 
efficient utilization of memory are achieved, making them good candidates for general 
measurement applications in high-speed environment. 


%It allows the network operators to configure a parameter $N$ (e.g., $N=1000$). The network elements (switches or routers) will record a packet for every $N$ packets, and the estimated size of a flow is obtained by multiplying the recorded size of the flow by $N$. However, the method has several inherent shortcomings. Firstly, manually setting the sampling rate is hard for the network operators since it requires a deep understanding of the characteristics of the network traffic, and the current sampling rate may not be satisfiable and a better sampling rate is needed when network anomalies such as flooding attacks are present. Secondly, the small flows with only a small number of packets may be left out frequently, and flow size estimation for sampled flows often has large error. Finally, even if sampling is adopted in NetFlow, it remains to be an open question as how to access and update the flow records efficiently. Let's focus on the question of designing algorithms to allow efficient access and update of flow records since it is more fundamental than the configuration of sampling rate.


%Basically there are two challenges in designing the algorithms:

%\textbf{The time budget is small.} Note that nowadays the data rate in networks can be 40 Gbps or even 100 Gbps. Suppose the average packet size is 700 bytes (which is the average packet size of our trace files from campus networks as described in Section \ref{subsection:experimentsettings}), the processing time for a single packet may be as small as 56 ns. Notice that the network elements need to support many functions other than network measurement such as Layer 2/3 forwarding and ACLs which cost a substantial part of the time quotas, and a lot of ALU operations are necessary for the measurement function. Since the access time of on-chip SRAM is about 1-10 ns\cite{li_flowradar:_2016}\cite{noauthor_access_nodate}, only a few memory accesses are allowed to do the network measurement.

%\textbf{The memory budget is limited.} In a commercial data center (such as Facebook's data centers), each host may initiate 100s to 1000s concurrent flows\cite{roy_inside_2015}, so a ToR switch needs to handle 10K concurrent flows while a core switch needs to handle up to 100K flows concurrently.\footnote{$1K=10^3$ in this paper.} Suppose each flow record occupies 64 bytes of memory as recommended in \cite{cisco_configuring_2018} and SRAM is used to support high-speed forwarding. To accommodate 100K flow records at least 6.4MB of SRAM is needed. However, the size of available SRAM in the latest generation of switch ASICs is 50-100 MB only\cite{miao_silkroad:_2017}\cite{moshref_scream:_2015}, and other functions such as routing, scheduling and security need a substantial amount of SRAM.

%As sampling based NetFlow are widely used in today's networks, researchers have moved the focus to another direction recently, i.e., using hashing-based solutions to record the flow information, including sketches\cite{liu_one_2016}\cite{yu_software_2013} and flow arrays\cite{yang_elastic_2018}\cite{sivaraman_heavy-hitter_2017}. This type of methods has several advantages comparing with sampling methods. First of all, the time complexity for processing each packet is constant, which helps to provide predictability for the performance of network elements, and the memory requirement is limited. Moreover, when properly designed, the error rate of sketch is upper-bounded, and higher precision can be achieved.

%Theoretically, with unlimited time budget we can design a perfect algorithm (e.g., dichotomy) to access the flow records with no ancillary memory, or with unlimited memory budget we can design a perfect data structure (e.g., a perfect hash table) to access a flow record with only 1 memory access.

Following these efforts, we propose HashFlow, which makes a further step in squeezing memory consumption. 
The central idea of HashFlow is to  
maintain accurate records for elephant flows (i.e., flows with many packets), as well as summarized records for mice flows (i,e., flows with a few packets), 
by applying novel strategies of collision resolution and record promotion to hash tables. 
The collision resolution part eliminates collisions that may mix up packets 
from different flows, keeps a flow from being evicted until another flow with larger size collides with it, while fully utilizing the memory space by filling up nearly all hash table buckets.
On the other hand, the record promotion part allows the flows to grow in the summarized set, and bounces a flow back from the summarized set to the accurate set and replaces the original one which has smaller size when 
this flow becomes large enough. 
The performance bound can be analyzed with a probabilistic model, 
and with this strategy, HashFlow achieves a better utilization of space, 
and also more accurate flow records, without bringing extra complexity. 

We have implemented HashFlow, as well as several latest flow measurement algorithms mentioned above, 
including FlowRadar, HashPipe and ElasticSketch, in a P4-programmable \cite{bosshart_p4:_2014} software switch\cite{noauthor_bmv2:_2018}. To illustrate the implementability of HashFlow, we further implement it in a commodity P4 switch \cite{noauthor_barefoot_nodate} which has the type of Wedge 100BF-32X\cite{noauthor_edgecore_nodate}.
We then use traces from different operational networks to evaluate their effectiveness.
In these experiments, for various types of traffic analysis applications, 
HashFlow demonstrates a consistently better performance against 
its state-of-the-art competitors. 
For example, using a small memory of 1 MB, HashFlow can accurately record 
around 55K flows, which is often 12.5\% higher than the others.
For estimating the sizes of 50K flows, HashFlow achieves a relative error of around 11.6\%, 
while the estimation error of the best competitor is 42.9\% higher. 
It detects 96.1\% of the heavy hitters out of 250K flows with a size estimation error of 5.6\%, which is 11.3\% and 73.7\% better than  
the best competitor respectively.
At last, we show that these merits of HashFlow come with negligible degradation of throughput.

\iffalse
a solution based on hashing techniques for NetFlow in this paper, namely HashFlow. Our basic ideas in designing HashFlow are as follows:
\begin{itemize}
    \item Carefully sacrifice the time budget to improve the utilization of memory space by using multiple hash functions, so that we can accommodate more flow records at the expense of slightly more processing time.
    \item Record the elephant flows preferentially and drop the information about the mice flows when necessary to guarantee the accuracy of the recorded information since elephant flows are more important than mice flows for many applications such as heavy hitter detection and traffic engineering.
\end{itemize}

We implement HashFlow in P4\cite{bosshart_p4:_2014} and deploy it on bmv2\cite{noauthor_bmv2:_2018}. Our experimental results on bmv2 show that the average relative error of flow size estimation for heavy hitters is 36.5 times smaller, the average relative error of flow size estimation over all flows is 2.4 times smaller, and the precision for heavy hitter is improved by 19\% respectively, comparing to that of ElasticSketch\cite{yang_elastic_2018}, which is the state-of-the-art solution for network traffic measurement.
\fi

The remainder of the paper is organized as follows.
We introduce our motivation and central ideas in designing HashFlow in Section~\ref{section:background}. 
We present the algorithm details, as well as the theoretical analysis in Section~\ref{section:algorithmoverview}, and then present the implementation details in hardware P4 switch in Section~\ref{section:implementation}. 
Using real traffic traces, we analyze the parameters of HashFlow and compare it against other algorithms in Section~\ref{section:evaluation}.
% and discuss the related work in Section~\ref{section:relatedwork}. 
Finally we conclude the paper in Section~\ref{section:conclusion}.
